---
title: "NBA Statistical Analysis"
output: 
  flexdashboard::flex_dashboard:
    theme:
      bootswatch: materia
      primary: "#F54242"
      secondary: "#2196f3"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

<style>
.chart-title {  /* chart_title  */
   font-size: 20px;
  }
body{ /* Normal  */
      font-size: 16px;
  }
</style>

```{css color tabs}
/* Set font color of inactive tabs to black */
.nav-tabs-custom .nav-tabs > li > a {
  color: black;
}

/* Set font color of the active tab to blue */
.nav-tabs-custom .nav-tabs > li.active > a {
  color: #2196f3;
}

/* Set font color of the active tab on hover */
.nav-tabs-custom .nav-tabs > li.active > a:hover {
  color: grey;
}

/* Let the sidebar scroll when its content overflows */
.sidebar {
  overflow: auto;
}
```

```{r setup, include=FALSE}
library(flexdashboard)
```

```{r data/packages}
library(pacman)
p_load(tidyverse, maps, viridis, plotly, DT, gridExtra)

nba_advanced <- read_csv("/Users/christopherbussen/Documents/School/UDS2023/MTH209/finalProject/nba2021_advanced.csv")
nba_advanced <- nba_advanced[!duplicated(nba_advanced$Player), ]
nba_per_game <- read_csv("/Users/christopherbussen/Documents/School/UDS2023/MTH209/finalProject/nba2021_per_game.csv")
# collapse hybrid position labels into the five standard positions
nba_per_game <- nba_per_game %>% 
  mutate(Pos = recode(Pos, 'F-C' = 'C', 'SF-PF' = 'SF', 'G' = 'PG', 'F' = 'PF'))
nba_per_game$Pos <- factor(nba_per_game$Pos,
                           levels = c("PG", "SG", "SF", "PF", "C"))


nba_team_stats <- read_csv("/Users/christopherbussen/Documents/School/UDS2023/MTH209/finalProject/nba_team_stats_00_to_21.csv")
nba_team_stats <- nba_team_stats %>% 
  filter(SEASON == "2020-21") %>% 
  rename("team" = "TEAM")
payroll <- read_csv("/Users/christopherbussen/Documents/School/UDS2023/MTH209/finalProject/NBA Payroll(1990-2023).csv")
payroll <- payroll %>% 
  filter(seasonStartYear == 2020) %>% 
  subset(select = c("team", "payroll"))

# remove every character except digits and ".", then convert payroll to integer
payroll$payroll <- gsub("[^0-9.]", "", payroll$payroll)
payroll$payroll <- as.integer(payroll$payroll)


salaries <- read_csv("/Users/christopherbussen/Documents/School/UDS2023/MTH209/finalProject/NBA Salaries(1990-2023).csv")
salaries <- salaries %>% 
  filter(seasonStartYear == 2020) %>% 
  rename("Player" = "playerName") %>% 
  subset(select = c("Player", "salary"))

# remove every character except digits and ".", then convert salary to integer
salaries$salary <- gsub("[^0-9.]", "", salaries$salary)
salaries$salary <- as.integer(salaries$salary)


country <- read_csv("/Users/christopherbussen/Documents/School/UDS2023/MTH209/finalProject/nba_all_teams.csv")
country <- country %>% 
  rename("Player" = "Player Name") %>% 
  subset(select = c("Player", "COUNTRY"))

height_and_weight <- read_csv("/Users/christopherbussen/Documents/School/UDS2023/MTH209/finalProject/all_seasons.csv")
height_and_weight <- height_and_weight %>% 
  rename("Player" = "player_name",
         "height" = "player_height",
         "weight" = "player_weight") %>% 
  subset(select = c("Player", "height", "weight"))
# convert height from centimeters to inches
height_and_weight$height <- height_and_weight$height / 2.54
# convert weight from kilograms to pounds, rounded to the nearest pound
height_and_weight$weight <- round(height_and_weight$weight * 2.20462, 0)

# add country to dataset
nba_per_game <- nba_per_game %>% 
  left_join(country, by = "Player")

# add salary to dataset
nba_per_game <- nba_per_game %>% 
  left_join(salaries, by = "Player")

nba_advanced <- nba_advanced %>% 
  left_join(salaries, by = "Player")

# minutes per game for the advanced table (MP there is total minutes, not per game)
nba_advanced <- nba_advanced %>% 
  mutate(MPG = round(MP / G, 1))

# add height and weight; all_seasons.csv can list a player once per season,
# so the join may duplicate players and only the first match for each player is kept
nba_per_game <- nba_per_game %>% 
  left_join(height_and_weight, by = "Player")
nba_per_game <- nba_per_game[!duplicated(nba_per_game$Player), ]

nba_team_stats <- nba_team_stats %>% 
  left_join(payroll, by = "team")

nba_team_stats <- select(nba_team_stats, -teamstatspk, -SEASON)

# players with no country match (useful for spotting name mismatches; not used in the plots)
addCountry <- filter(nba_per_game, is.na(COUNTRY))
```

Introduction
===

Column {.tabset data-width=650}
-----------------------------------------------------------------------

### Basic Info

<font size = 5>
**An Analysis of the Correlation Between NBA Statistics**
</font>

The purpose of this study is to analyze the National Basketball Association (NBA) and how different statistics, both individual and team, relate to one another. In addition to basic statistics such as points and assists, I also study advanced statistics, which have become more important as NBA teams have grown more data-oriented. These statistics guide NBA teams and players daily, so a better understanding of their impact can offer insight into why teams and players make the decisions they do.

This presentation uses three main datasets: per-game player statistics, advanced player statistics, and team statistics. Some of these were built by combining several source files to add variables such as player country, height, weight, and salary. Both player datasets cover the 2020-21 NBA season and contain data for 481 players. The team dataset covers team statistics for the same season.
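Combining files this way has a side effect. If the height and weight file lists a player once per season (as `all_seasons.csv` appears to), the join can repeat that player's row, and the data-loading chunk then keeps only the first row per player with `!duplicated()`. The toy sketch below shows this behavior. The players and values are made up and are not taken from the real data.

```r
library(dplyr)

# Made-up data: "Player A" appears twice in the height/weight table
per_game <- tibble(Player = c("Player A", "Player B"), PTS = c(20.1, 8.4))
bio <- tibble(Player = c("Player A", "Player A", "Player B"),
              height = c(80, 80, 75),
              weight = c(220, 225, 190))

# The left join keeps every match, so Player A now has two rows
joined <- left_join(per_game, bio, by = "Player")
nrow(joined)  # 3

# Keep the first row for each player, mirroring the dashboard's dedup step
deduped <- joined[!duplicated(joined$Player), ]
deduped       # Player A keeps weight 220 (the first match)
```

Which row counts as "first" depends on the row order of the source file. Any height or weight change across seasons is silently resolved in favor of the earliest listed row.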

This presentation aims to answer the following questions:

- Which positions tend to post better numbers in each of the main statistical categories? 

- How, if at all, do height and weight affect offensive and defensive statistics (for example, OBPM, DBPM, or basic per-game stats)?

- How does a team’s payroll affect its overall performance?

### Glimpse of Per Game

```{r}
glimpse(nba_per_game)
```

### Glimpse of Advanced

```{r}
glimpse(nba_advanced)
```

### Glimpse of Team Stats

```{r}
glimpse(nba_team_stats)
```


Column {data-width=350}
-----------------------------------------------------------------------

### Explanation of Variables


Player Overview
===

Column {.tabset}
-----

### Per Game Table

```{r pg table}
DT::datatable(nba_per_game[,1:32], rownames = FALSE, 
              options = list(columnDefs = list(list(className = 'dt-center', targets = 1:31))))
```

### Advanced Table
```{r advanced table}
DT::datatable(nba_advanced[,1:26], rownames = FALSE, 
              options = list(columnDefs = list(list(className = 'dt-center', targets = 1:25))))
```

### Team Stats Table
```{r team table}
DT::datatable(nba_team_stats[,1:28], rownames = FALSE, 
              options = list(columnDefs = list(list(className = 'dt-center', targets = 1:27))))
```

### Birthplaces

```{r world map 1, echo=FALSE}
world <- map_data("world")

count <- nba_per_game %>% 
  group_by(COUNTRY) %>% 
  summarize(count = n())

birthplaces <- count %>% 
  left_join(world, by = c("COUNTRY" = "region"))

# draw the full world map first as its own grey geom_polygon layer so that countries
# with no NBA players still appear; the second layer fills in the player counts
p1 <- world %>% 
  ggplot() +  
  geom_polygon(aes(x=long, y=lat, group=group, text = region), fill = "grey", alpha=0.5) +
  geom_polygon(data = birthplaces, aes(x=long, y=lat, group=group, fill = count,  text = paste0(COUNTRY, ":\n", count, " NBA Player(s)"))) +
  scale_fill_viridis_c(option = "H") +
  theme_void() + 
  labs(title = "NBA Players' Birthplaces")

ggplotly(p1, tooltip = "text")
```

### Birthplaces (filtered)

In this heatmap, I have filtered out all players from the USA so that the number of players from other countries is easier to see.
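A sorted table can be easier to read than map colors. The sketch below shows one way to produce the same filtered counts as a table. The `players` tibble is a made-up stand-in for `nba_per_game`, and its values are invented.

```r
library(dplyr)

# Made-up stand-in for nba_per_game; NA marks a player with no country match
players <- tibble(COUNTRY = c("USA", "USA", "Canada", "France", "Canada", NA))

# filter() keeps only rows where the condition is TRUE, so both the
# US players and the NA row are dropped
non_us <- players %>%
  filter(COUNTRY != "USA") %>%
  count(COUNTRY, sort = TRUE)

non_us  # Canada 2, France 1
```

The same `filter()` rule applies to the filtered map. Players without a matched country are excluded along with US players. They would not appear on the map either way, because they have no polygon to color.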

```{r world map 2}
birthplaces <- filter(birthplaces, COUNTRY != "USA")
p2 <- world %>% 
  ggplot() +  
  geom_polygon(aes(x = long, y = lat, group = group, text = region), fill = "grey", alpha=0.5) +
  geom_polygon(data = birthplaces, aes(x = long, y = lat, group = group, fill = count,  text = paste0(COUNTRY, ":\n", count, " NBA Player(s)"))) +
  scale_fill_viridis_c(option = "H") +
  theme_void() + 
  labs(title = "NBA Players' Birthplaces (Excluding USA)")

ggplotly(p2, tooltip = "text")
```


Height
===

Column {.tabset data-width=850 .no-padding}
-----

### Points

```{r}
# create height groups (inches); cut() uses right-closed intervals, so (83, Inf] is 7'0" and taller
nba_per_game$height_group <- cut(nba_per_game$height, breaks = c(66, 75, 79, 83, Inf),
                                 labels = c("<6'4", "6'4-6'7", "6'8-6'11", "7'0+"))

over10ppg <- filter(nba_per_game, PTS > 10)
ptsH <- ggplot(over10ppg, aes(x = height, y = PTS)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(70, 85, by=3), limits = c(70, 85)) + 
  labs(title="Distribution of PPG Based on Height", x="Height (in.)", y="Points") + 
  theme(plot.title = element_text(hjust = 0.5))

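# drop only rows with a missing height group; na.omit() would also drop players missing COUNTRY or salary
ptsHGroup <- ggplot(drop_na(over10ppg, height_group), aes(x = height_group, y = PTS)) + 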
  geom_boxplot(fill = "#2196f3") + 
  labs(title ="", x="Height Group", y = NULL) + 
  theme(text = element_text(size = 10))

grid.arrange(ptsH, ptsHGroup, ncol = 2, widths = c(1.9, 1))

```

Players averaging 10 or fewer points per game are excluded from this chart, because every height had many low-scoring players who would otherwise crowd the plot.
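The height groups come from `cut()`. By default, `cut()` builds right-closed intervals `(a, b]`, so a height that falls exactly on a break goes into the lower group. The sketch below applies the same break points to a few made-up heights in inches to show where each one lands.

```r
# Made-up heights in inches: 6'3", 6'4", 6'7", 6'8", 6'11", 7'0"
heights <- c(75, 76, 79, 80, 83, 84)

# Same break points as the height-group code; intervals are (a, b]
groups <- cut(heights, breaks = c(66, 75, 79, 83, Inf))

# Group index per height: 6'3" -> 1, 6'4" and 6'7" -> 2,
# 6'8" and 6'11" -> 3, 7'0" -> 4
as.integer(groups)
levels(groups)
```

The dashboard converts centimeters to inches by dividing by 2.54. A height meant to sit exactly on a break (such as 83 in.) may therefore land a tiny fraction above or below it. Rounding first, for example with `round(height, 1)`, would make the grouping more predictable.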

### Assists

```{r}
over2ast <- filter(nba_per_game, AST > 2)
astsH <- ggplot(over2ast, aes(x = height, y = AST)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(70, 85, by=3), limits = c(70, 85)) + 
  labs(title="Distribution of APG Based on Height", x="Height (in.)", y="Assists") + 
  theme(plot.title = element_text(hjust = 0.5))

astsHGroup <- ggplot(drop_na(over2ast, height_group), aes(x = height_group, y = AST)) + 
  geom_boxplot(fill = "#2196f3") + 
  labs(title ="", x="Height Group", y = NULL) + 
  theme(text = element_text(size = 10))

grid.arrange(astsH, astsHGroup, ncol = 2, widths = c(1.9, 1))
```


### Rebounds

```{r}
over3rb <- filter(nba_per_game, TRB > 3)
rbsH <- ggplot(over3rb, aes(x = height, y = TRB)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(70, 90, by=4), limits = c(70, 90)) + 
  labs(title="Distribution of RPG Based on Height", x="Height (in.)", y="Rebounds") + 
  theme(plot.title = element_text(hjust = 0.5))

rbsHGroup <- ggplot(drop_na(over3rb, height_group), aes(x = height_group, y = TRB)) + 
  geom_boxplot(fill = "#2196f3") + 
  labs(title ="", x="Height Group", y = NULL) + 
  theme(text = element_text(size = 10))

grid.arrange(rbsH, rbsHGroup, ncol = 2, widths = c(1.9, 1))
```



### Blocks

```{r}
blkH <- ggplot(nba_per_game, aes(x = height, y = BLK)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(70, 90, by=4), limits = c(70, 90)) + 
  labs(title="Distribution of BPG Based on Height", x="Height (in.)", y="Blocks") + 
  theme(plot.title = element_text(hjust = 0.5))

blkHGroup <- ggplot(drop_na(nba_per_game, height_group), aes(x = height_group, y = BLK)) + 
  geom_boxplot(fill = "#2196f3") + 
  labs(title ="", x="Height Group", y = NULL) + 
  theme(text = element_text(size = 10))

grid.arrange(blkH, blkHGroup, ncol = 2, widths = c(1.9, 1))
```


### Steals

```{r}
stlH <- ggplot(nba_per_game, aes(x = height, y = STL)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(70, 90, by=4), limits = c(70, 90)) + 
  labs(title="Distribution of SPG Based on Height", x="Height (in.)", y="Steals") + 
  theme(plot.title = element_text(hjust = 0.5))

stlHGroup <- ggplot(drop_na(nba_per_game, height_group), aes(x = height_group, y = STL)) + 
  geom_boxplot(fill = "#2196f3") + 
  labs(title ="", x="Height Group", y = NULL) + 
  theme(text = element_text(size = 10))

grid.arrange(stlH, stlHGroup, ncol = 2, widths = c(1.9, 1))
```



### FG%

```{r}
fgH <- ggplot(nba_per_game, aes(x = height, y = `FG%`)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(70, 90, by=4), limits = c(70, 90)) + 
  labs(title="Distribution of FG% Based on Height", x="Height (in.)", y="FG%") + 
  theme(plot.title = element_text(hjust = 0.5))

fgHGroup <- ggplot(drop_na(nba_per_game, height_group), aes(x = height_group, y = `FG%`)) + 
  geom_boxplot(fill = "#2196f3") + 
  labs(title ="", x="Height Group", y = NULL) + 
  theme(text = element_text(size = 10))

grid.arrange(fgH, fgHGroup, ncol = 2, widths = c(1.9, 1))

```



Column
-----------------------------------------------------------------------

### Height Analysis


Weight
===

Column {.tabset data-width=650}
-----

### Points 

```{r}
ggplot(over10ppg, aes(x = weight, y = PTS)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(170, 300, by=25), limits = c(170, 300)) + 
  labs(title="Distribution of PPG Based on Weight", x="Weight (lbs.)", y="Points") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### Assists

```{r}
ggplot(over2ast, aes(x = weight, y = AST)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(170, 300, by=25), limits = c(170, 300)) + 
  labs(title="Distribution of APG Based on Weight", x="Weight (lbs.)", y="Assists") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### Rebounds

```{r}
ggplot(over3rb, aes(x = weight, y = TRB)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(170, 300, by=25), limits = c(170, 300)) + 
  labs(title="Distribution of RPG Based on Weight", x="Weight (lbs.)", y="Rebounds") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### Blocks

```{r}
ggplot(nba_per_game, aes(x = weight, y = BLK)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(160, 315, by=25), limits = c(160, 315)) + 
  labs(title="Distribution of BPG Based on Weight", x="Weight (lbs.)", y="Blocks") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### Steals

```{r}
ggplot(nba_per_game, aes(x = weight, y = STL)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(160, 315, by=25), limits = c(160, 315)) + 
  labs(title="Distribution of SPG Based on Weight", x="Weight (lbs.)", y="Steals") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### FG%

```{r}
ggplot(nba_per_game, aes(x = weight, y = `FG%`)) + 
  geom_point(col = "#2196f3") + 
  scale_x_continuous(breaks = seq(160, 315, by=25), limits = c(160, 315)) + 
  labs(title="Distribution of FG% Based on Weight", x="Weight (lbs.)", y="FG%") + 
  theme(plot.title = element_text(hjust = 0.5))
```


Column
---

### Weight Analysis

Position Analysis
===

Column {.tabset data-width=650}
---

### Points

```{r}
ggplot(nba_per_game, aes(x = Pos, y = PTS)) + 
  geom_boxplot(fill = "#2196f3") + 
  scale_y_continuous(breaks = seq(0, 35, by=5), limits = c(0, 35)) + 
  labs(title="Effect of Position on PPG", x="Position", y="Points") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### Assists

```{r}
ggplot(nba_per_game, aes(x = Pos, y = AST)) + 
  geom_boxplot(fill = "#2196f3") + 
  scale_y_continuous(breaks = seq(0, 12, by=2), limits = c(0, 12)) + 
  labs(title="Effect of Position on APG", x="Position", y="Assists") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### Rebounds

```{r}
ggplot(nba_per_game, aes(x = Pos, y = TRB)) + 
  geom_boxplot(fill = "#2196f3") + 
  scale_y_continuous(breaks = seq(0, 15, by=3), limits = c(0, 15)) + 
  labs(title="Effect of Position on RPG", x="Position", y="Rebounds") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### Steals

```{r}
ggplot(nba_per_game, aes(x = Pos, y = STL)) + 
  geom_boxplot(fill = "#2196f3")  + 
  scale_y_continuous(breaks = seq(0, 2, by=.5), limits = c(0, 2)) + 
  labs(title="Effect of Position on SPG", x="Position", y="Steals") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### Blocks

```{r}
ggplot(nba_per_game, aes(x = Pos, y = BLK)) + 
  geom_boxplot(fill = "#2196f3")  + 
  scale_y_continuous(breaks = seq(0, 3.5, by=.5), limits = c(0, 3.5)) + 
  labs(title="Effect of Position on BPG", x="Position", y="Blocks") + 
  theme(plot.title = element_text(hjust = 0.5))
```


### FG%

```{r}
ggplot(nba_per_game, aes(x = Pos, y = `FG%`)) + 
  geom_boxplot(fill = "#2196f3") + 
  scale_y_continuous(breaks = seq(0, 1, by=.25), limits = c(0, 1)) + 
  labs(title="Effect of Position on FG%", x="Position", y="FG%") + 
  theme(plot.title = element_text(hjust = 0.5))
```

### Avg. Salary

```{r avg salary ~ pos}
avgPosSalary <- over10ppg %>% 
  group_by(Pos) %>% 
  summarise(
    avgSalary = mean(salary, na.rm = T)
  )

avgSalaries <- ggplot(avgPosSalary, aes(x = Pos, y = avgSalary)) + 
  geom_col(fill = "#2196f3", aes(text = paste0("Position: ", Pos, "\nAverage Salary: $", round(avgSalary,0)))) + 
  labs(title="Average Salary by Position", x="Position", y="Average Salary ($)") + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  scale_y_continuous(breaks = seq(0, 20000000, by=4000000), limits = c(0, 20000000), labels = scales::comma)

ggplotly(avgSalaries, tooltip = "text")
```

This chart includes only players averaging more than 10 points per game, so that the averages are not pulled down by players who rarely play.
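The 10-points-per-game filter changes both the averages and the number of players behind each bar. The sketch below uses invented players to show how reporting a count next to each mean makes that effect visible. The helper name `summarise_salary` is hypothetical and is not part of the dashboard code.

```r
library(dplyr)

# Made-up stand-in for nba_per_game (salary in dollars; NA = no salary match)
players <- tibble(
  Pos    = c("PG", "PG", "PG", "C", "C"),
  PTS    = c(25.0, 4.0, 12.0, 15.0, 2.0),
  salary = c(30e6, 1e6, 10e6, 20e6, NA)
)

# Hypothetical helper: mean salary and number of players per position
summarise_salary <- function(df) {
  df %>%
    group_by(Pos) %>%
    summarise(avgSalary = mean(salary, na.rm = TRUE),
              n = n(),            # counts players, including any with NA salary
              .groups = "drop")
}

all_players <- summarise_salary(players)
scorers     <- summarise_salary(filter(players, PTS > 10))

all_players  # C: 20,000,000 over 2 players; PG: about 13,666,667 over 3
scorers      # C: 20,000,000 over 1 player;  PG: 20,000,000 over 2
```

With only a handful of players behind some bars, a single large contract can move the average noticeably. Showing `n` in the tooltip is one way to flag this.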

Column
---

### Analysis



Team Analysis
===

Column {.tabset data-width=650}
---

### Payroll vs. Wins

```{r payroll scatter}
# name the plot payrollPlot so it does not overwrite the payroll data frame
payrollPlot <- ggplot(nba_team_stats, aes(x = payroll, y = `WIN%`)) + 
  geom_point(col = "#2196f3", aes(text = paste0("Team: ", team, "\nPayroll: $", payroll, "\nWin %: ", `WIN%`))) + 
  scale_x_continuous(breaks = seq(90000000, 175000000, by=20000000), limits = c(90000000, 175000000), labels = scales::comma) + 
  scale_y_continuous(breaks = seq(0, 1, by=.25), limits = c(0,1)) + 
  labs(title="Effect of Payroll on Win Percentage", x="Payroll ($)", y="Win %") + 
  theme(plot.title = element_text(hjust = 0.5))

ggplotly(payrollPlot, tooltip = "text")
```


### PPG vs. Wins

```{r ppg scatter}
ppg <- ggplot(nba_team_stats, aes(x = PTS, y = `WIN%`, label = team)) + 
  geom_point(col = "#2196f3", aes(text = paste0("Team: ", nba_team_stats$team, "\nPoints Per Game: ", PTS, "\nWin %: ", nba_team_stats$`WIN%`))) + 
  scale_x_continuous(breaks = seq(100, 115, by=3), limits = c(100, 115)) +
  scale_y_continuous(breaks = seq(0, 1, by=.25), limits = c(0,1)) +
  labs(title="Effect of Points Per Game on Win Percentage", x="PPG", y="Win %") + 
  theme(plot.title = element_text(hjust = 0.5))

ggplotly(ppg, tooltip = "text")
```


### 3P% vs. Wins

```{r 3p scatter}
threePct <- ggplot(nba_team_stats, aes(x = `3P%`, y = `WIN%`, label = team)) + 
  geom_point(col = "#2196f3", aes(text = paste0("Team: ", nba_team_stats$team, "\n3P%: ", `3P%`, "\n3PM: ", nba_team_stats$`3PM`, "\n3PA: ", nba_team_stats$`3PA`, "\nWin %: ", nba_team_stats$`WIN%`))) + 
  scale_x_continuous(breaks = seq(30, 40, by=2), limits = c(30, 40)) + 
  labs(title="Effect of 3 Point Percentage on Win Percentage", x="3P%", y="Win %") + 
  theme(plot.title = element_text(hjust = 0.5))

ggplotly(threePct, tooltip = "text")
```



Column
---

### Analysis


Advanced Stats
===

Column {.tabset data-width=650}
---

### Highest WS

```{r}
best50 <- nba_advanced %>% 
  arrange(desc(WS)) %>% 
  slice(1:50)

DT::datatable(best50[,1:28], rownames = FALSE, 
              options = list(columnDefs = list(list(className = 'dt-center', targets = 1:27))))
```


### TS% vs. PER

```{r}

tsPer <- ggplot(best50, aes(x = `TS%`, y = PER, label = Player)) + 
  geom_point(col = "#2196f3", aes(text = paste0("Player: ", Player, "\nPosition: ", Pos, "\nTrue Shooting %: ", `TS%`, "\nPER: ", PER, "\nMPG: ", MPG))) + 
  scale_x_continuous(breaks = seq(0.5, 0.75, by=.05), limits = c(0.5, 0.75)) +
  scale_y_continuous(breaks = seq(0, 35, by=5), limits = c(0,35)) +
  labs(title="Relationship between TS% and PER", x="TS%", y="PER") + 
  theme(plot.title = element_text(hjust = 0.5))

ggplotly(tsPer, tooltip = "text")
```

### WS/48 vs. Minutes

Column
---

### Analysis

I narrowed this section to the 50 players with the highest Win Shares (WS). WS summarizes overall contribution and, as a cumulative stat, grows with minutes played, so ranking by it filters out players who rarely see the court.
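WS is a season total, while WS/48 is a per-minute rate. The sketch below uses made-up numbers to compare three rankings: by total WS, by WS/48 alone, and by WS/48 after a minimum-minutes filter. The 500-minute cutoff is an arbitrary assumption for illustration and is not used anywhere in this analysis.

```r
library(dplyr)

# Made-up season totals; MP is total minutes, as in nba_advanced
adv <- tibble(
  Player = c("A", "B", "C", "D"),
  MP     = c(2000, 100, 1200, 800),
  WS     = c(6, 1, 3, 4)
) %>%
  mutate(`WS/48` = WS * 48 / MP)  # win shares per 48 minutes

# Ranking by total WS favors heavy-minute players
by_ws <- adv %>% arrange(desc(WS)) %>% pull(Player)

# Ranking by WS/48 alone lets a low-minute player (B) jump to the top
by_rate <- adv %>% arrange(desc(`WS/48`)) %>% pull(Player)

# Hypothetical alternative: require a minimum number of minutes first
min_minutes <- 500
by_rate_filtered <- adv %>%
  filter(MP >= min_minutes) %>%
  arrange(desc(`WS/48`)) %>%
  pull(Player)

by_ws             # "A" "D" "C" "B"
by_rate           # "B" "D" "A" "C"
by_rate_filtered  # "D" "A" "C"
```

Ranking by total WS handles low-minute players implicitly. A minutes filter followed by WS/48 instead compares efficiency among regular players, at the cost of picking a cutoff.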

Conclusion
===

Column {data-width=650}
---

### Results


### Limitations
A major limitation of this study is that the best dataset I could find covers the 2020-21 NBA season. Other NBA datasets were available, but this was the only one that included the advanced statistics many analysts rely on today. Although 2020-21 is recent enough to offer insight into today's NBA, the dataset was also compiled midway through that season, so the player statistics reflect only part of a season.

### References 
https://www.kaggle.com/datasets/umutalpaydn/nba-20202021-season-player-stats

https://www.kaggle.com/datasets/justinas/nba-players-data

https://www.kaggle.com/datasets/loganlauton/nba-players-and-team-data?select=NBA+Payroll%281990-2023%29.csv


About the Author
===

Column {data-width=650}
---

### About Me
My name is Christopher Bussen and I am an undergraduate student at the University of Dayton. I am currently working towards my B.S. in Computer Science with minors in Mathematics and Data Analytics and am on track to graduate in May 2024.

After graduation, I am interested in pursuing full-time employment in a data analytics position, especially one that allows me to combine my love of sports and math. 

I have exposure to Google Analytics, SPSS, SQL, Golang, Tableau, pandas, and Git, and I am proficient in Java, Python, R, HTML, CSS, and MS 365 applications.

Please connect with me on LinkedIn [here](https://www.linkedin.com/in/christopherbussen/).

Column {.tabset data-width=600}
---

### Picture

```{r , fig.width=6, echo=FALSE, fig.cap="Christopher Bussen", fig.align='center'}
knitr::include_graphics("headshot.jpeg")
```